Create docs/gateway.md covering architecture, prerequisites, compatible gateway implementations, setup steps, configuration options (auto-detection, explicit flags, per-deployment overrides), usage examples (curl and Python), and troubleshooting. Update docs/architecture.md with a Gateway API Integration section and link to the new guide. Update README.md with a Gateway API Integration highlight and doc link. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… routing
Add support for the Gateway API Inference Extension (inference.networking.k8s.io/v1) to provide a single unified inference gateway endpoint across all providers. When Gateway API CRDs are detected in the cluster, the controller automatically creates InferencePool and HTTPRoute resources for each ModelDeployment.
Controller changes:
- Add gateway-api and gateway-api-inference-extension Go dependencies
- Add GatewaySpec (spec.gateway) and GatewayStatus to ModelDeployment CRD
- Implement gateway reconciler for InferencePool and HTTPRoute lifecycle
- Add gateway auto-detection with CRD availability caching
- Support explicit --gateway-name/--gateway-namespace flags
- Add RBAC for inferencepools, httproutes, and gateways
- Inject kubeairunway.ai/model-deployment label in all providers (KAITO, Dynamo, KubeRay)
Backend/frontend changes:
- Add GET /gateway/status and GET /gateway/models API routes
- Add gateway status to deployment detail responses
- Add GatewayStatus, GatewayInfo, GatewayModelInfo shared types
- Add gateway API client methods in frontend
Tests and docs:
- Add gateway reconciler tests (11 tests) and detection tests (7 tests)
- Add docs/gateway.md with architecture, setup, and usage guide
- Update docs/architecture.md, crd-reference.md, controller-architecture.md, api.md
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ndpoint
- Fix backend API group from inference.networking.x-k8s.io/v1alpha2 to inference.networking.k8s.io/v1 to match upstream stable API
- Add required EndpointPickerRef to InferencePool with configurable --epp-service-name and --epp-service-port controller flags
- Resolve gateway endpoint from Gateway.status.addresses instead of constructing invalid DNS name
- Add Istio setup notes and EPP configuration docs to gateway.md
- Add test for endpoint resolution from Gateway status
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Probe the model server's /v1/models endpoint to resolve the actual served model name when no explicit spec.gateway.modelName or spec.model.servedName is set. This fixes gateway routing for baked-in model images where the served name differs from spec.model.id.
Resolution priority:
1. spec.gateway.modelName (explicit override)
2. spec.model.servedName (user-specified)
3. Auto-discovered from /v1/models on running server
4. spec.model.id (fallback)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add tests for resolveModelName priority chain: explicit override, served name, unreachable server fallback, no endpoint fallback
- Update gateway.md with model name resolution section documenting the 4-level priority chain including auto-discovery
- Fix stale comment in modeldeployment_types.go
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ady=False
- cleanupGatewayResources now sets GatewayReady condition to False so conditions stay consistent when gateway resources are removed
- When deployment leaves Running phase (Failed, Terminating, etc.), gateway resources are cleaned up if they previously existed
- Add test for phase transition cleanup and condition verification
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
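A rough Go sketch of the phase-transition rule, assuming simplified stand-in types (Phase, Condition, gatewayState) and hypothetical Reason strings; the actual deletion of InferencePool, HTTPRoute and EPP objects is elided.

```go
package main

import "fmt"

// Phase values mirror the ones named in the commit message.
type Phase string

const (
	PhaseRunning     Phase = "Running"
	PhaseFailed      Phase = "Failed"
	PhaseTerminating Phase = "Terminating"
)

// Condition is a minimal stand-in for metav1.Condition.
type Condition struct {
	Type   string
	Status string // "True" or "False"
	Reason string
}

// gatewayState is a hypothetical summary of what the reconciler tracks.
type gatewayState struct {
	ResourcesExist bool
	GatewayReady   Condition
}

// cleanupGatewayResources removes resources (elided) and keeps the
// GatewayReady condition consistent by setting it to False.
func cleanupGatewayResources(st *gatewayState, reason string) {
	st.ResourcesExist = false
	st.GatewayReady = Condition{Type: "GatewayReady", Status: "False", Reason: reason}
}

// reconcilePhase ensures resources while Running and cleans them up once the
// deployment leaves Running, but only if they previously existed.
func reconcilePhase(phase Phase, st *gatewayState) {
	if phase == PhaseRunning {
		st.ResourcesExist = true
		st.GatewayReady = Condition{Type: "GatewayReady", Status: "True", Reason: "Reconciled"}
		return
	}
	if st.ResourcesExist {
		cleanupGatewayResources(st, fmt.Sprintf("DeploymentPhase%s", phase))
	}
}
```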
- Fail fast at startup if only one of --gateway-name/--gateway-namespace is set, preventing silent fallback to auto-detection
- Add 60s TTL for negative CRD detection results so gateway integration self-enables if CRDs are installed after controller startup. Positive results remain cached permanently.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Tests the full Gateway API Inference Extension integration:
- Installs Gateway API CRDs, Inference Extension CRDs, and Istio
- Creates Gateway resource and deploys a CPU model
- Verifies InferencePool created with correct selector and EPP ref
- Verifies HTTPRoute created with correct backend ref
- Verifies model name auto-discovery from /v1/models
- Tests actual inference routing through the Istio gateway
- Tests gateway disable and resource cleanup
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The gateway reconciliation may need an extra reconcile cycle after the deployment transitions to Running phase. Add a 30-attempt retry loop with 5s intervals instead of checking once. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Set model.id in test fixture so fallback model name is non-empty
- Replace gateway-routed inference test with direct service test (gateway routing requires EPP which isn't deployed in e2e)
- Keep gateway resource verification (InferencePool, HTTPRoute, status, conditions) as the GAIE integration test
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The auto-discovery probes /v1/models on the model service, but status.endpoint.port may contain the container port (e.g. 5000) while the service exposes port 80. Look up the actual service port first, falling back to status.endpoint.port if unavailable. This specifically fixes aikit/llamacpp models where KAITO reports container port 5000 but the service maps 80→5000. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The controller needs permission to read Services to look up the actual service port for model name auto-discovery. Without it, the probe used the container port (e.g. 5000) instead of the service port (80), causing discovery to fail. Also adds resolveServicePort(), which looks up the Service's HTTP port, preferring a port named 'http' or one numbered 80/8080. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
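The port preference can be sketched like this. ServicePort is a minimal stand-in for the Kubernetes type, and falling back to the first declared port when nothing matches is an assumption; the commits only name the 'http' and 80/8080 preferences and the status.endpoint.port fallback.

```go
package main

// ServicePort is a minimal stand-in for corev1.ServicePort.
type ServicePort struct {
	Name string
	Port int32
}

// resolveServicePort picks the Service port to probe for /v1/models.
// ports is empty when the Service could not be read; endpointPort is
// status.endpoint.port, which may hold the container port instead.
func resolveServicePort(ports []ServicePort, endpointPort int32) int32 {
	if len(ports) == 0 {
		return endpointPort
	}
	for _, p := range ports {
		if p.Name == "http" {
			return p.Port
		}
	}
	for _, p := range ports {
		if p.Port == 80 || p.Port == 8080 {
			return p.Port
		}
	}
	return ports[0].Port // assumption: first port as a last resort
}
```

For the aikit/llamacpp case, a Service exposing 80 (targeting 5000) with status.endpoint.port 5000 resolves to 80.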
Install the upstream inferencepool helm chart to deploy the EPP (Endpoint Picker Proxy), then test actual inference routing through the Istio gateway instead of direct service port-forward. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The controller now automatically creates the Endpoint Picker Proxy (EPP) deployment, service, RBAC, and config when gateway integration is enabled. Users no longer need to install the EPP separately.
Resources created per ModelDeployment:
- ServiceAccount, Role, RoleBinding for EPP RBAC
- ConfigMap with default plugins config
- Deployment running the upstream EPP image
- Service exposing gRPC port 9002
All resources are owned by the ModelDeployment and cleaned up automatically. EPP image is configurable via --epp-image flag.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The controller needs pods get/watch/list and leases create/get/update permissions on its own service account to avoid RBAC escalation errors when creating the EPP Role (Kubernetes prevents granting permissions the creator doesn't hold). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The HTTPRoute may be created in the same reconcile cycle as the verification step runs. Add a retry loop to wait for it. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pods created by providers may not have the kubeairunway.ai/model-deployment label. The controller now discovers pods via the model service's selector and patches the label onto them, provider-agnostically. Also adds pod patch RBAC and fixes EPP log label in e2e debug. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…writes) The EPP watches these experimental resources even when they are unused. Without RBAC for them, the EPP's cache sync fails and its health check returns NOT_SERVING. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The controller needs the same permissions it grants to the EPP Role, otherwise Kubernetes blocks the Role creation as RBAC escalation. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The controller deploys the EPP (Deployment + Service + RBAC), but Istio-specific wiring (DestinationRule with h2c upgrade) is BYO. Apply it directly in the e2e test since this is implementation-specific. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Kind doesn't support LoadBalancer, so the Gateway never becomes Programmed. Use networking.istio.io/service-type: NodePort annotation to get a NodePort service that works in Kind. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Port-forwarding to the gateway pod bypasses ext_proc. Use the NodePort service endpoint instead, accessed via the node's internal IP. Also remove the exclude-from-external-load-balancers label from the Kind node. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
InferencePool targetPorts route traffic directly to pods, so they need the container port (e.g. 5000), not the service port (e.g. 80). Look up the service's targetPort to get the actual container port. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
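Resolving a Service targetPort to a container port follows standard Kubernetes semantics: a numeric targetPort is the container port, a named one refers to a container port name, and an omitted one defaults to the Service port. The sketch models targetPort as a string rather than intstr.IntOrString to stay dependency-free; containerPortFor is a hypothetical name.

```go
package main

import "strconv"

// ContainerPort is a minimal stand-in for corev1.ContainerPort.
type ContainerPort struct {
	Name          string
	ContainerPort int32
}

// containerPortFor resolves a Service port's targetPort to the pod's
// container port, which is what InferencePool targetPorts needs.
func containerPortFor(servicePort int32, targetPort string, ports []ContainerPort) (int32, bool) {
	if targetPort == "" {
		return servicePort, true // Kubernetes default: targetPort = port
	}
	if n, err := strconv.ParseInt(targetPort, 10, 32); err == nil {
		return int32(n), true // numeric targetPort
	}
	for _, p := range ports {
		if p.Name == targetPort {
			return p.ContainerPort, true // named targetPort
		}
	}
	return 0, false // named port not found on the container
}
```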
The validation fix for extensionManager.backendResources without hooks may only be on main. Try the latest dev build. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Traffic routing through the gateway requires either:
- Envoy AI Gateway controller (for backendResources support)
- Istio with working ext_proc/mTLS (which fails with connection_termination in Kind)
Neither works in a basic Kind cluster. The e2e tests verify all controller-side logic comprehensively. Traffic routing was validated manually on AKS.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Revert from Envoy Gateway to Istio. Add cloud-provider-kind to provide LoadBalancer IP assignment in Kind, which should fix the Gateway Programmed=Unknown issue. Also restores the traffic routing test using the Gateway's LoadBalancer IP directly. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
cloud-provider-kind provides a LoadBalancer IP and the Gateway is Programmed, but Istio's ext_proc can't connect to the EPP without mTLS. Enable sidecar injection on the default namespace so the EPP gets an Istio proxy. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Explicitly tell the Istio sidecar to intercept port 9002 for ext_proc gRPC traffic. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
With enableAutoMtls=false, the gateway proxy should connect to the EPP using plaintext gRPC without mTLS. No sidecar needed on the EPP pod. The ext_proc cluster should use h2c based on the service port name (grpc-ext-proc) and appProtocol. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Per upstream GAIE chart (inferencepool/templates/istio.yaml), Istio needs tls.mode=SIMPLE with insecureSkipVerify=true to connect to the EPP. The previous h2UpgradePolicy approach was wrong. Also adds cloud-provider-kind for LoadBalancer IP in Kind. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When httpRouteRef is set, the controller skips auto-creating the HTTPRoute and uses the user-provided one. This enables custom routing logic like LoRA adapter selection, traffic splitting across model versions, and custom payload processors. The controller still auto-creates InferencePool + EPP regardless. Cleanup also respects httpRouteRef — won't delete user-provided routes. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
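The httpRouteRef branch amounts to a small decision table. Below is a hypothetical Go sketch in which httpRouteRef is modelled as a plain route name and the controller-created route is assumed to share the deployment's name; neither detail is confirmed by the commit.

```go
package main

// GatewaySpec is a hypothetical slice of spec.gateway; HTTPRouteRef names a
// user-managed HTTPRoute, or is "" to let the controller create one.
type GatewaySpec struct {
	HTTPRouteRef string
}

// gatewayPlan lists what the reconciler manages for one ModelDeployment.
type gatewayPlan struct {
	InferencePool bool   // always created
	EPP           bool   // always created
	ManageRoute   bool   // controller creates, updates and deletes the HTTPRoute
	RouteName     string // route the deployment is served through
}

func planGateway(deploymentName string, spec GatewaySpec) gatewayPlan {
	plan := gatewayPlan{InferencePool: true, EPP: true}
	if spec.HTTPRouteRef != "" {
		// User-provided route: reference it, never create or delete it.
		plan.RouteName = spec.HTTPRouteRef
		return plan
	}
	plan.ManageRoute = true
	plan.RouteName = deploymentName // hypothetical naming convention
	return plan
}

// routesToDelete returns the HTTPRoutes cleanup is allowed to remove.
func routesToDelete(plan gatewayPlan) []string {
	if !plan.ManageRoute {
		return nil
	}
	return []string{plan.RouteName}
}
```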
Per Gateway API conventions, readiness shouldn't be a single bool. The GatewayReady condition with reason/message already captures this with proper granularity. Users should check the condition or refer to Gateway API resource status directly. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
If gateway reconciliation fails with a CRD-not-found error (e.g. CRDs were removed), refresh the detection cache so subsequent reconciles skip gateway integration gracefully. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pin Gateway API Inference Extension CRDs to v1.3.1 instead of latest. Update Go module dependency to match. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Log a warning when multiple Gateways are labeled with kubeairunway.ai/inference-gateway=true, suggesting gatewayRef for explicit selection. Uses the first labeled one. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
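The selection rule reads roughly as follows. Gateway is a minimal stand-in type, and "first" here means first in the slice passed in, which in a real controller depends on how the Gateways were listed.

```go
package main

import "log"

const inferenceGatewayLabel = "kubeairunway.ai/inference-gateway"

// Gateway is a minimal stand-in for a gateway.networking.k8s.io Gateway.
type Gateway struct {
	Namespace string
	Name      string
	Labels    map[string]string
}

// pickLabeledGateway returns the first Gateway labeled
// kubeairunway.ai/inference-gateway=true and warns when several match.
func pickLabeledGateway(gateways []Gateway) (Gateway, bool) {
	var matches []Gateway
	for _, g := range gateways {
		if g.Labels[inferenceGatewayLabel] == "true" { // nil map lookup yields ""
			matches = append(matches, g)
		}
	}
	if len(matches) == 0 {
		return Gateway{}, false
	}
	if len(matches) > 1 {
		log.Printf("warning: %d Gateways are labeled %s=true; using %s/%s (set gatewayRef to choose explicitly)",
			len(matches), inferenceGatewayLabel, matches[0].Namespace, matches[0].Name)
	}
	return matches[0], true
}
```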
BBR (Body-Based Router) is a separate deployment needed only for multi-model setups. Updated architecture diagram, added BBR section with helm install instructions pinned to v1.3.1, and clarified that single-model setups don't need BBR. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Install the upstream body-based-routing helm chart with Istio provider in the e2e test. Validates the full GAIE stack. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
For multi-model setups with BBR, each HTTPRoute needs a header match on X-Gateway-Base-Model-Name to route to the correct InferencePool. BBR sets this header from the request body's model field. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Define GAIE_VERSION in Makefile (v1.3.1) and DefaultGAIEVersion constant in gateway package. EPP image tag defaults to this version in both cmd/main.go and gateway_reconciler.go. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The header match (X-Gateway-Base-Model-Name) only works when BBR is deployed. Add a fallback PathPrefix / match so single-model setups work without BBR. With BBR, the header match takes priority. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ypes
1. Remove duplicate DeploymentConfig interface (incompatible properties broke TypeScript build; pre-existing issue also on main)
2. Derive gateway model readiness from GatewayReady condition instead of removed status.gateway.ready field
3. Restore shared/types/aikit.ts re-export file and barrel export
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add --enable-lora engine arg when adapters are specified
- Add loraEnvVars helper for Dynamo LoRA env vars (DYN_LORA_ENABLED, DYN_SYSTEM_ENABLED, DYN_SYSTEM_PORT, DYN_LORA_PATH)
- Inject LoRA env vars into aggregated, prefill, and decode workers
- Add reconcileAdapters to create/update DynamoModel CRDs per adapter
- Add cleanupOrphanedDynamoModels for adapter lifecycle management
- Add DynamoModel cleanup on ModelDeployment deletion
- Add RBAC marker for DynamoModel resources
- Set LoRASupport: true in provider capabilities
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
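The env var injection can be sketched as a pure function applied to each worker (aggregated, prefill, decode). The variable names come from the commit message; the "true" values and the caller-supplied port and path are assumptions, and EnvVar is a stand-in for the Kubernetes type.

```go
package main

import "strconv"

// EnvVar is a minimal stand-in for corev1.EnvVar.
type EnvVar struct {
	Name  string
	Value string
}

// loraEnvVars returns the Dynamo LoRA variables named in the commit.
// Values are assumptions: booleans as "true", port and path from the caller.
func loraEnvVars(systemPort int, loraPath string) []EnvVar {
	return []EnvVar{
		{Name: "DYN_LORA_ENABLED", Value: "true"},
		{Name: "DYN_SYSTEM_ENABLED", Value: "true"},
		{Name: "DYN_SYSTEM_PORT", Value: strconv.Itoa(systemPort)},
		{Name: "DYN_LORA_PATH", Value: loraPath},
	}
}

// withLoRA returns copies of a worker's args and env with LoRA enabled, or
// the inputs unchanged when no adapters are configured.
func withLoRA(args []string, env []EnvVar, adapterCount, systemPort int, loraPath string) ([]string, []EnvVar) {
	if adapterCount == 0 {
		return args, env
	}
	outArgs := append(append([]string{}, args...), "--enable-lora")
	outEnv := append(append([]EnvVar{}, env...), loraEnvVars(systemPort, loraPath)...)
	return outArgs, outEnv
}
```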
- Add LoRAAdapterSpec and AdapterStatus types to ModelDeployment
- Add LoRASupport capability to InferenceProviderConfig
- Webhook validation: block llamacpp+adapters, unique names, hf:// scheme
- Provider auto-selection filters by LoRA support
- KAITO: map adapters to inference.adapters on Workspace
- KubeRay: inject --enable-lora + --lora-modules into VLLM_ENGINE_ARGS
- Dynamo: --enable-lora, LoRA env vars, DynamoModel CRDs, init container for HF adapter download, modelRef for endpoint discovery
- Gateway: auto-create InferenceObjective per adapter
- Update Dynamo runtime images to 0.9.0
- Add unit tests for all providers and webhook
- Add docs/lora-adapters.md user guide
- Add sample YAML with chess LoRA adapter
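Two of the items above, the webhook rules and the KubeRay argument injection, can be sketched in Go. Both are illustrations under stated assumptions: Adapter is a hypothetical stand-in for LoRAAdapterSpec, the engine is a plain string, VLLM_ENGINE_ARGS is treated as a space-separated CLI string, and hf:// sources are assumed to reach vLLM as bare Hugging Face repo IDs in vLLM's name=path --lora-modules form.

```go
package main

import (
	"fmt"
	"strings"
)

// Adapter is a hypothetical stand-in for LoRAAdapterSpec.
type Adapter struct {
	Name   string
	Source string // hf://<org>/<repo>
}

const hfScheme = "hf://"

// validateAdapters sketches the webhook rules: no adapters with llamacpp,
// unique adapter names, and hf:// sources only.
func validateAdapters(engine string, adapters []Adapter) error {
	if len(adapters) > 0 && engine == "llamacpp" {
		return fmt.Errorf("engine %q does not support LoRA adapters", engine)
	}
	seen := make(map[string]bool, len(adapters))
	for i, a := range adapters {
		if seen[a.Name] {
			return fmt.Errorf("adapters[%d]: duplicate name %q", i, a.Name)
		}
		seen[a.Name] = true
		if !strings.HasPrefix(a.Source, hfScheme) || len(a.Source) == len(hfScheme) {
			return fmt.Errorf("adapters[%d]: source %q must use the %s scheme", i, a.Source, hfScheme)
		}
	}
	return nil
}

// vllmLoRAArgs appends --enable-lora and --lora-modules name=repo pairs to a
// space-separated engine argument string.
func vllmLoRAArgs(base string, adapters []Adapter) string {
	if len(adapters) == 0 {
		return base
	}
	parts := []string{}
	if base != "" {
		parts = append(parts, base)
	}
	parts = append(parts, "--enable-lora", "--lora-modules")
	for _, a := range adapters {
		parts = append(parts, a.Name+"="+strings.TrimPrefix(a.Source, hfScheme))
	}
	return strings.Join(parts, " ")
}
```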
LoRA Adapter Support
Adds unified LoRA adapter abstraction to ModelDeployment CRD with support across all three providers.
Changes
- ModelDeployment CRD: spec.adapters[] with name and source (hf:// URI scheme)
- InferenceProviderConfig: loraSupport capability for auto-selection filtering
- KAITO: inference.adapters on Workspace CRD
- KubeRay: --enable-lora + --lora-modules into VLLM_ENGINE_ARGS
- Dynamo: --enable-lora, LoRA env vars, DynamoModel CRDs, init container for HF download, modelRef for endpoint discovery, updated to 0.9.0 runtime images
Testing
- unsloth/Qwen3-0.6B + lucylq/qwen3_06B_lora_math adapter
Known Issues
- hf:// download via DynamoModel is async and may silently fail. file:// local path loading works reliably after the init container pre-downloads the adapter.
TODO
- /v1/loras API call from provider controller instead of DynamoModel CRD for hf:// sources